Recognition apparatus, recognition method, and computer program product

ABSTRACT

According to an embodiment, a recognition apparatus includes one or more processors. The one or more processors are configured to calculate, based on an input signal, a score vector sequence in which a plurality of score vectors each including respective scores of symbols are arranged; and cause, among: a first score vector in which a representative symbol corresponding to a best score is a recognition-target symbol; a second score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is worse than a first threshold; and a third score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is equal to the first threshold or better than the first threshold, a third score vector satisfying a predefined first condition, to pass through to filter the score vector sequence.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2016-224033, filed on Nov. 17, 2016; the entire contents of which are incorporated herein by reference.

FIELD

Embodiments described herein relate generally to a recognition apparatus, a recognition method, and a computer program product.

BACKGROUND

There has been known a recognition apparatus that recognizes patterns of input signals and converts the input signals into a symbol sequence. For example, there have been known a speech recognition apparatus that recognizes speech signals, an optical character recognition (OCR) apparatus that recognizes characters from an image, and the like. In such recognition apparatuses, input signals are divided into frames, and score calculation and symbol sequence search are performed for each divided frame.

Meanwhile, there exists a recognition apparatus that introduces a symbol representing that information included in an input signal is not a recognition target, and skips search processing when the score of the symbol is large enough. Because such a recognition apparatus skips processing for searching for a symbol that is not a recognition target, calculation costs can be reduced.

Nevertheless, in conventional recognition apparatuses, if symbols that are not recognition targets are skipped too much, a recognition rate declines in some cases.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating a configuration of a recognition apparatus according to an embodiment;

FIG. 2 is a diagram illustrating an example of a configuration of a score calculator;

FIG. 3 is a diagram illustrating a processing flow of a searching unit;

FIG. 4 is a diagram illustrating an example of a symbol sequence retrieved by the searching unit;

FIG. 5 is a diagram illustrating processing of deleting a recognition-target symbol of consecutive recognition-target symbols from the symbol sequence illustrated in FIG. 4;

FIG. 6 is a diagram illustrating processing of deleting symbols other than recognition-target symbols from the symbol sequence illustrated in FIG. 5;

FIG. 7 is a diagram illustrating a processing flow of a filtering unit;

FIG. 8 is a diagram illustrating a first example of a score vector sequence obtainable before processing is performed by the filtering unit, and a score vector sequence obtainable after the processing is performed by the filtering unit;

FIG. 9 is a diagram illustrating a second example of a score vector sequence obtainable before processing is performed by the filtering unit, and a score vector sequence obtainable after the processing is performed by the filtering unit;

FIG. 10 is a diagram illustrating an example of a pseudo-code representing processing performed by the recognition apparatus; and

FIG. 11 is a hardware block diagram of the recognition apparatus.

DETAILED DESCRIPTION

According to an embodiment, a recognition apparatus is for performing pattern recognition of an input signal being a recognition target. The recognition apparatus includes one or more processors. The one or more processors are configured to calculate, based on the input signal, a score vector sequence in which a plurality of score vectors each including respective scores of symbols are arranged, and cause a partial score vector of the calculated score vector sequence to pass through to filter the score vector sequence. The one or more processors are configured to cause, among: a first score vector in which a representative symbol corresponding to a best score is a recognition-target symbol; a second score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is worse than a first threshold; and a third score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is equal to the first threshold or better than the first threshold, a third score vector satisfying a predefined first condition, to pass through to filter the score vector sequence.

An embodiment will be described in detail below with reference to the drawings. A recognition apparatus 10 according to the present embodiment accurately recognizes patterns of input signals with small calculation costs, and outputs a recognition result of the input signals.

The recognition apparatus 10 recognizes information represented by an input signal, and outputs a recognition result. The input signal may be any signal as long as the signal includes pattern-recognizable information. Examples of the input signal include a speech signal, a signal representing handwriting, an image signal representing a character, a moving image signal representing a gesture such as sign language, and the like.

First of all, terms used in the embodiment will be described.

A symbol represents pattern-recognizable information included in an input signal. For example, when the input signal is a speech signal, the symbol represents acoustic information included in the speech signal.

In addition, the acoustic information includes linguistic information. The linguistic information included in the acoustic information is information representable by characters that is added to the speech signal by a speaker speaking a language. For example, the linguistic information included in the acoustic information is a phoneme, a syllable, phonemes combined for each mora, a sub-word, a character, a word, and the like. In the case of Japanese, the linguistic information may be kana. In addition, in the case of English, the linguistic information may be a phonetic symbol or a letter of the alphabet.

In addition, the acoustic information may include paralinguistic information and nonlinguistic information. The paralinguistic information is information unidentifiable from the linguistic information that is added to the speech signal by a speaker producing a sound. For example, the paralinguistic information is a filler indicating that a speaker is thinking, and the like. The nonlinguistic information is information representing features of a speaker that is included in the speech signal. For example, the nonlinguistic information is gender of the speaker, age of the speaker, physical features of the speaker, and the like.

In addition, the acoustic information may include silent information. The silent information is information representing a state in which none of the linguistic information, the paralinguistic information, and the nonlinguistic information is included in the speech signal (e.g., silence and noise).

A symbol set is a set constituted by symbols each serving as an element. The symbol set is predefined. The symbol set includes, as symbols, at least one recognition-target symbol, and a non-target symbol.

The recognition-target symbol is a symbol representing information to be recognized by the recognition apparatus 10, among pieces of information included in an input signal. For example, if the input signal is a speech signal, the symbol set may include, as recognition-target symbols, characters corresponding to all pieces of linguistic information that can be included in the speech signal (e.g., all phonetic symbols). In addition, if the recognition apparatus 10 recognizes only a specific word (e.g., recognizes only “good”), the symbol set may include, as recognition-target symbols, characters corresponding to linguistic information necessary for recognizing the specific word. In addition, when paralinguistic information, nonlinguistic information, and/or silent information is used as a recognition target, the symbol set may include, as one of the recognition-target symbols, a symbol representing paralinguistic information, nonlinguistic information, and/or silent information.

The non-target symbol is a symbol representing that it is undetermined which piece of information among information pieces represented by recognition-target symbols is included in an input signal. In other words, the non-target symbol is a symbol representing that the recognition apparatus 10 cannot recognize recognition-target symbols at the present stage. More specifically, the non-target symbol is a symbol representing that processing of a below-described score calculator 26 determining which recognition-target symbol is to have a good score is suspended. The score of the non-target symbol becomes better when the processing is suspended, and becomes worse when the processing is not suspended. Thus, as described below, when a good score is calculated for the non-target symbol upon input of an input signal corresponding to one frame, the input signal sometimes corresponds to a part or all of recognition target information pieces.

A symbol sequence is a series of likely symbols obtained by recognizing an input signal. The recognition apparatus 10 may generate one symbol sequence for one input signal. In addition, the recognition apparatus 10 may generate M (M is an integer of two or more) symbol sequences for one input signal.

An output symbol represents a recognition result of an input signal. In the case of recognizing a speech signal, the output symbol may be a word, a character, a sub-word sequence, and the like. The output symbol is generated based on a recognition-target symbol included in a symbol sequence. The recognition apparatus 10 may generate a plurality of output symbols arranged in chronological order, from one symbol sequence. A plurality of output symbols arranged in chronological order is sometimes called an output symbol sequence.

FIG. 1 is a diagram illustrating a configuration of the recognition apparatus 10 according to the embodiment. The recognition apparatus 10 includes a feature extractor 22, a pattern recognition model storage 24, the score calculator 26, a filtering unit 28, a search model storage 30, and a searching unit 32.

The feature extractor 22 acquires a recognition target input signal. For example, the feature extractor 22 acquires a speech signal as an input signal.

The feature extractor 22 analyzes an input signal for each frame, and calculates a feature vector for each frame. The feature vector includes a plurality of types of feature amounts representing features of information included in the input signal. For example, when the input signal is a speech signal, the feature vector includes a plurality of types of feature amounts representing features of speech. The frame is a segment of an input signal for calculating one feature vector. The frames are set so that the center time shifts by a predetermined interval from one frame to the next. In addition, the frames have time lengths identical to one another, for example. The segment of each frame may partially overlap that of another frame.
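The framing described above can be sketched as follows. This is only an illustrative sketch: the function name split_into_frames and the parameters frame_len and hop (both counted in samples) are hypothetical and do not appear in the embodiment, and trailing samples that do not fill a whole frame are simply dropped here as an assumption.

```python
from typing import List, Sequence


def split_into_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut `signal` into equal-length frames whose start shifts by `hop` samples.

    When hop < frame_len, neighbouring frames partially overlap.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    # Assumption: samples after the last complete frame are discarded.
    while start + frame_len <= len(signal):
        frames.append(list(signal[start:start + frame_len]))
        start += hop
    return frames


# Ten samples, frames of four samples shifted by two samples each.
frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
```

In this sketch, one feature vector would then be calculated from each element of frames.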

The pattern recognition model storage 24 stores a pattern recognition model. The pattern recognition model is data used by the score calculator 26 for performing pattern recognition of an input signal. The pattern recognition model is appropriately trained by a learning device in advance of the recognition of the input signal that is performed by the recognition apparatus 10. For example, the pattern recognition model storage 24 may be realized by a server on a network. In addition, when the recognition apparatus 10 performs speech recognition, the pattern recognition model storage 24 stores an acoustic model.

Based on the feature vector calculated by the feature extractor 22 for each frame, the score calculator 26 calculates a score vector sequence in which a plurality of score vectors are arranged, using the pattern recognition model stored in the pattern recognition model storage 24. The score vectors include the respective scores of symbols each being an element of a predefined symbol set. For example, when the recognition apparatus 10 performs speech recognition, the score vectors include respective acoustic scores of the symbols.

Each of the scores included in the score vectors corresponds to one of the symbols. Each score represents likelihood of information represented by a corresponding symbol being included in an input signal. For example, an acoustic score represents likelihood of acoustic information represented by a corresponding symbol being included in a speech signal. In addition, frame synchronization (temporal synchronization) need not be performed between information included in an input signal, and information represented by a symbol. In other words, the information represented by the symbol may be later than the information included in the input signal. For example, when the input signal is a speech signal, among acoustic scores included in score vectors calculated by the score calculator 26 based on an input of a feature vector of the 15th frame, acoustic information represented by an input symbol corresponding to the best acoustic score may be included in any of the first to tenth frames.

The score vectors are normalized so that the combination of all scores included therein equals a specific value (e.g., 1). For example, when the scores represent probability or likelihood, the score vectors are normalized so that the sum of all scores included therein equals a specific value. In addition, when the scores represent logarithmic probability or logarithmic likelihood, the score vectors are normalized so that exponentiating each of the scores included therein and then adding all the resultant values produces a specific value.
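As a minimal sketch of the two normalizations described above, assuming the specific value is 1, the following may be considered. The function names are hypothetical, and the subtraction of the maximum before exponentiation is a common numerical-stability measure that is not stated in the embodiment.

```python
import math
from typing import List


def normalize_probabilities(scores: List[float]) -> List[float]:
    """Scale non-negative scores so that their sum equals 1."""
    total = sum(scores)
    return [s / total for s in scores]


def normalize_log_scores(log_scores: List[float]) -> List[float]:
    """Shift log-domain scores so that the sum of their exponentials equals 1."""
    m = max(log_scores)  # subtracted for numerical stability (assumption)
    log_total = m + math.log(sum(math.exp(s - m) for s in log_scores))
    return [s - log_total for s in log_scores]


probs = normalize_probabilities([2.0, 1.0, 1.0])
log_probs = normalize_log_scores([0.5, -1.0, 2.0])
```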

For example, a score may represent probability, likelihood, logarithmic likelihood, or logarithmic probability of information represented by a corresponding symbol being included in an input signal. A larger value of the score may indicate a better score (i.e., more likely), or a smaller value may indicate a better score. For example, when the score represents probability, likelihood, logarithmic probability, or logarithmic likelihood, a larger value of the score indicates a better score. In addition, for example, when the score represents logarithmic probability with a reversed sign or logarithmic likelihood with a reversed sign, a smaller value of the score indicates a better score. In addition, when the score represents some sort of distance between an input signal (feature vector) and a pattern recognition model, a smaller value of the score indicates a better score.

In addition, a symbol corresponding to the best score among a plurality of scores included in the score vectors will be hereinafter referred to as a representative symbol. For example, when a larger score is better, a symbol corresponding to the largest score included in the score vectors will be referred to as a representative symbol. In addition, when a smaller score is better, a symbol corresponding to the smallest score included in the score vectors will be referred to as a representative symbol.
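The selection of a representative symbol can be illustrated as follows. This is a sketch under assumptions: a score vector is represented as a dictionary from symbol to score, the label "eps" stands in for the non-target symbol, and the function name and the larger_is_better flag are hypothetical.

```python
from typing import Dict, Tuple


def representative_symbol(score_vector: Dict[str, float],
                          larger_is_better: bool = True) -> Tuple[str, float]:
    """Return the symbol with the best score together with that score."""
    pick = max if larger_is_better else min
    symbol = pick(score_vector, key=score_vector.get)
    return symbol, score_vector[symbol]


vector = {"eps": 0.7, "g": 0.2, "o": 0.1}
best_when_larger = representative_symbol(vector)
best_when_smaller = representative_symbol(vector, larger_is_better=False)
```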

The score vector sequence is information in which a plurality of score vectors are arranged. The score calculator 26 gives the calculated score vector sequence to the filtering unit 28. In addition, the feature extractor 22 and the score calculator 26 correspond to a calculator that calculates a score vector sequence based on an input signal.

The filtering unit 28 receives the score vector sequence from the score calculator 26. The filtering unit 28 causes a part of the score vectors of the score vector sequence calculated by the score calculator 26 to pass through. In other words, the filtering unit 28 deletes a part of the score vectors from the score vector sequence output from the score calculator 26, and sequentially outputs the remaining score vectors.

More specifically, among the score vector sequence, the filtering unit 28 causes a first score vector in which a representative symbol is a recognition-target symbol, and a second score vector in which a representative symbol is a non-target symbol and a score of the representative symbol is worse than a first threshold, to pass through.

Furthermore, among third score vectors included in the score vector sequence, that is, score vectors in which representative symbols are non-target symbols and scores of the representative symbols are equal to the first threshold or better than the first threshold, the filtering unit 28 causes a third score vector satisfying the predefined first condition to pass through. In other words, the filtering unit 28 deletes third score vectors not satisfying the first condition, from the third score vectors included in the score vector sequence, and causes the other score vectors to pass through.

For example, the filtering unit 28 determines any one or more and K−1 or fewer third score vectors of consecutive K (K is an integer of two or more) third score vectors as the third score vector satisfying the first condition. In addition, for example, the filtering unit 28 may determine any one third score vector of the consecutive K third score vectors as the third score vector satisfying the first condition.
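One possible instance of this first condition, in which only the leading third score vector of each run of consecutive third score vectors passes through, can be sketched as follows. The sketch assumes that a larger score is better and that a score vector is a dictionary from symbol to score; the names is_third and filter_sequence are hypothetical. It also lets an isolated third score vector pass through, which is an assumption of this sketch rather than something stated above.

```python
from typing import Dict, List, Set

ScoreVector = Dict[str, float]


def is_third(vec: ScoreVector, targets: Set[str], threshold: float) -> bool:
    """True when the best symbol is a non-target symbol scoring at or above the threshold."""
    sym = max(vec, key=vec.get)
    return sym not in targets and vec[sym] >= threshold


def filter_sequence(seq: List[ScoreVector], targets: Set[str],
                    threshold: float) -> List[ScoreVector]:
    """Pass first and second score vectors, and only the leading vector of each third-vector run."""
    passed = []
    prev_third = False
    for vec in seq:
        third = is_third(vec, targets, threshold)
        if not third or not prev_third:
            passed.append(vec)
        prev_third = third
    return passed


seq = [
    {"eps": 0.05, "g": 0.90, "o": 0.05},  # first score vector
    {"eps": 0.95, "g": 0.03, "o": 0.02},  # third (start of a run, passes)
    {"eps": 0.97, "g": 0.02, "o": 0.01},  # third (same run, deleted)
    {"eps": 0.60, "g": 0.30, "o": 0.10},  # second score vector
    {"eps": 0.99, "g": 0.00, "o": 0.01},  # third (isolated, passes by assumption)
    {"eps": 0.10, "g": 0.10, "o": 0.80},  # first score vector
]
filtered = filter_sequence(seq, targets={"g", "o"}, threshold=0.9)
```

Choosing the leading vector keeps the filter usable in a streaming manner, because the decision for each score vector depends only on the score vector immediately before it.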

Alternatively, when a representative symbol included in a score vector immediately preceding a partial vector sequence constituted by the consecutive K third score vectors, and a representative symbol included in a score vector immediately following the partial vector sequence are identical, the filtering unit 28 may determine one or more and K−1 or fewer third score vectors of the partial vector sequence as the third score vector satisfying the first condition.

The filtering unit 28 gives, to the searching unit 32, a score vector sequence including passed score vectors. In addition, processing performed by the filtering unit 28 will be further described with reference to a flow illustrated in FIG. 7.

The search model storage 30 stores a search model. The search model is data used by the searching unit 32 for generating a symbol sequence and an output symbol sequence from the score vector sequence. The search model is appropriately trained by a learning device in advance of the recognition of an input signal that is performed by the recognition apparatus 10. For example, the search model storage 30 may be realized by a server on a network.

The searching unit 32 receives the score vector sequence output from the filtering unit 28. The searching unit 32 generates a symbol sequence by searching for a symbol path that follows likely scores in the received score vector sequence. The searching unit 32 may generate a symbol sequence using the search model stored in the search model storage 30. The symbol path is a series of symbols selected for each score vector. In addition, when the number of elements in the symbol set is denoted by x, and the length of the score vector sequence is denoted by y, the number of combinations possible as a symbol path is represented as x^y. The searching unit 32 may directly store the symbol path as a symbol sequence, or may indirectly store the symbol path by referring to the search model.

Furthermore, based on recognition-target symbols included in the symbol sequence, the searching unit 32 generates an output symbol representing a pattern recognition result of an input signal. The searching unit 32 generates the pattern recognition result of the input signal by combining consecutive identical recognition-target symbols on the path into one. In addition, for example, the searching unit 32 may generate the pattern recognition result of the input signal by combining consecutive identical recognition-target symbols on the path into one, and then, excluding non-target symbols on the path. The searching unit 32 may generate an output symbol using the search model stored in the search model storage 30.

The above-described searching unit 32 may generate, after generating a symbol sequence, an output symbol based on the symbol sequence. In addition, the searching unit 32 may collectively generate a symbol sequence and an output symbol. In addition, the searching unit 32 may generate one symbol sequence, or may generate M symbol sequences. In addition, from each symbol sequence, the searching unit 32 may generate one output symbol, or may generate a plurality of output symbols arranged in chronological order.

For example, the search model used by the searching unit 32 is a weighted finite-state transducer (WFST). In this case, based on a Viterbi algorithm, the searching unit 32 searches for a symbol path on which an accumulated value of scores becomes the best. In addition, the search model used by the searching unit 32 may be a recurrent neural network (RNN) or a network deriving from an RNN. By using such a search model, the searching unit 32 can place restrictions on a path that can be retrieved as a symbol path, specify a path to be preferentially retrieved in the searching, and specify a symbol sequence to be preferentially generated even if the score is bad. Furthermore, the search model includes information representing correspondence relationship between a symbol sequence and an output symbol. When the search model is a WFST, the searching unit 32 may store a symbol path according to a path on the WFST, that is, a combination of a state and transition of the WFST.

Then, the searching unit 32 outputs the generated output symbol as a recognition result of an input signal.

FIG. 2 is a diagram illustrating an example of a configuration of the score calculator 26. As illustrated in FIG. 2, for example, the score calculator 26 may be a recurrent neural network (RNN) to which Connectionist Temporal Classification (CTC) is applied.

For example, the score calculator 26 includes an input layer 42, at least one intermediate layer 44, and an output layer 46. The input layer 42, the intermediate layer 44, and the output layer 46 each execute acquisition processing of at least one signal, calculation processing of the acquired signal, and output processing of the at least one signal.

The input layer 42, the at least one intermediate layer 44, and the output layer 46 are connected in series. The input layer 42 receives a feature vector, and executes calculation processing. Then, the input layer 42 outputs at least one signal obtained as a calculation result, to the intermediate layer 44 on a subsequent stage. In addition, each of the intermediate layers 44 executes calculation processing on at least one signal received from the preceding stage. Then, each of the intermediate layers 44 outputs at least one signal obtained as a calculation result, to the intermediate layer 44 on a subsequent stage or the output layer 46. Furthermore, each of the intermediate layers 44 may have a returning path for returning a signal to itself.

The output layer 46 executes calculation processing on the signal received from the intermediate layer 44 on the preceding stage. Then, the output layer 46 outputs a score vector as a calculation result. The output layer 46 outputs as many signals as there are symbols. Each signal output from the output layer 46 is associated with a corresponding one of the symbols. For example, the output layer 46 executes calculation using a softmax function.

In addition, parameters used by each layer for calculation processing are given from the pattern recognition model stored in the pattern recognition model storage 24. Based on the feature vector, the pattern recognition model is pre-trained by a learning device so as to output the respective scores of symbols included in a predefined symbol set. In other words, the pattern recognition model is trained by the learning device so as to output the score of each of at least one recognition-target symbol, and the score of a non-target symbol.

The score calculator 26 can thereby simultaneously output the respective scores of the symbols included in a symbol set. In other words, the score calculator 26 can simultaneously output the respective scores of at least one recognition-target symbol and a non-target symbol.

In addition, in place of the RNN, the score calculator 26 may be a network called long short-term memory obtained by extending the RNN. In addition, in place of the softmax function, the output layer 46 may use a support vector machine (e.g., Yichuan Tang, “Deep Learning using Linear Support Vector Machines”, arXiv:1306.0239v4 [cs.LG], Feb. 21, 2015).

FIG. 3 is a diagram illustrating a processing flow of the searching unit 32. When generating the single best symbol sequence, the searching unit 32 executes processing using a procedure as illustrated in FIG. 3.

First, in S11, the searching unit 32 acquires a score vector sequence from the filtering unit 28. Subsequently, in S12, based on the score vector sequence, the searching unit 32 searches for a likely symbol path, and generates one symbol sequence. For example, the searching unit 32 may generate a symbol sequence by selecting, for each frame, a symbol having the best score, and connecting the selected symbols. In addition, for example, the searching unit 32 may generate a symbol sequence by searching for the best path based on the Viterbi algorithm or the like using a search model such as a WFST.
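The simpler of the two alternatives in S12, selecting the best-scoring symbol for each score vector and connecting the selected symbols, can be sketched as follows. The sketch assumes that a larger score is better and that each score vector is a dictionary from symbol to score; the function name and the label "eps" for the non-target symbol are hypothetical. It does not cover the Viterbi search over a WFST.

```python
from typing import Dict, List


def greedy_symbol_sequence(score_vectors: List[Dict[str, float]]) -> List[str]:
    """Pick the symbol with the largest score in each score vector, in order."""
    return [max(vec, key=vec.get) for vec in score_vectors]


score_vectors = [
    {"eps": 0.8, "g": 0.1, "o": 0.1},
    {"eps": 0.1, "g": 0.7, "o": 0.2},
    {"eps": 0.2, "g": 0.6, "o": 0.2},
    {"eps": 0.1, "g": 0.1, "o": 0.8},
]
symbol_sequence = greedy_symbol_sequence(score_vectors)
```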

Subsequently, in S13, among the symbol sequence, the searching unit 32 detects a section in which a plurality of recognition-target symbols are consecutively arranged, leaves any one of the plurality of consecutive recognition-target symbols, and deletes the others. The searching unit 32 can thereby prevent identical information (e.g., identical linguistic information) from being redundantly recognized.

For example, among the symbol sequence, the searching unit 32 leaves a leading one of the plurality of consecutive recognition-target symbols, and deletes the second and subsequent recognition-target symbols. Alternatively, among the symbol sequence, the searching unit 32 may leave the last one of the plurality of consecutive recognition-target symbols, and delete the others.

Subsequently, in S14, among the symbol sequence processed in S13, the searching unit 32 leaves recognition-target symbols, and deletes non-target symbols. In other words, among the symbol sequence, the searching unit 32 leaves only recognition-target symbols. The searching unit 32 can thereby generate an output symbol based on the recognition-target symbols.

Subsequently, in S15, the searching unit 32 generates an output symbol from the symbol sequence processed in S13 and S14. In other words, the searching unit 32 generates an output symbol from the symbol sequence only including recognition-target symbols.

For example, referring to a search model being a correspondence table of a symbol sequence and an output symbol, the searching unit 32 sequentially extracts, in order from a leading symbol of the symbol sequence, an output symbol matching a part of the symbol sequence. For example, the search model being a correspondence table of a symbol sequence and an output symbol may be a pronunciation dictionary in which a phonetic symbol sequence and a word are associated. In addition, the searching unit 32 may chronologically generate a plurality of output symbols from one symbol sequence.
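A table lookup of this kind can be sketched as follows. The matching policy is an assumption of this sketch: at each position the longest entry of the table that matches is taken, and a symbol matching no entry is skipped. The name extract_output_symbols and the small lexicon are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Lexicon = Dict[Tuple[str, ...], str]


def extract_output_symbols(symbols: Sequence[str], lexicon: Lexicon) -> List[str]:
    """Scan from the leading symbol and emit output symbols for matching parts."""
    max_len = max((len(key) for key in lexicon), default=0)
    outputs = []
    i = 0
    while i < len(symbols):
        # Try the longest candidate first (assumption: longest match wins).
        for n in range(min(max_len, len(symbols) - i), 0, -1):
            key = tuple(symbols[i:i + n])
            if key in lexicon:
                outputs.append(lexicon[key])
                i += n
                break
        else:
            i += 1  # no entry starts here; skip this symbol (assumption)
    return outputs


lexicon = {("g", "o", "o", "d"): "good", ("g", "o"): "go", ("d", "o", "g"): "dog"}
words = extract_output_symbols(["g", "o", "o", "d", "d", "o", "g"], lexicon)
```

A WFST-based search model would typically resolve such ambiguities by score rather than by length; the longest-match rule is used here only to keep the sketch short.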

In addition, the searching unit 32 may independently execute the processes in S12, S13, S14, and S15. In addition, when the search model is a WFST, the searching unit 32 may collectively execute the processes in S12, S13, S14, and S15.

Subsequently, in S16, the searching unit 32 outputs each output symbol as a recognition result of an input signal.

In addition, the searching unit 32 may generate M symbol sequences. In this case, the searching unit 32 executes the processes in S12 to S15 for each of the symbol sequences. In addition, when the search model is a WFST, by collectively executing the processes in S12 to S15, the searching unit 32 can generate M symbol sequences.

FIGS. 4, 5, and 6 are diagrams for describing details of processing performed by the searching unit 32 when alphabetic characters are recognized. When alphabetic characters are recognized from a speech signal according to the processing flow illustrated in FIG. 3, the searching unit 32 executes the following processing.

In addition, a pattern recognition model (acoustic model) is pre-trained by a learning device so as to recognize alphabetic characters included in a symbol set. In many cases, recognition-target symbols are phoneme symbols. Nevertheless, in this example, the acoustic model is trained so as to recognize alphabetic characters. Such a training method is described in Alex Graves and Navdeep Jaitly, “Towards end-to-end speech recognition with recurrent neural networks”, in Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764-1772, 2014, for example.

For example, in S12, the searching unit 32 generates a symbol sequence as illustrated in FIG. 4. Here, for example, a predefined symbol set is assumed to be as follows.

symbol set={ε, d, g, o}

In addition, recognition-target symbols are assumed to be as follows.

set of recognition-target symbols={d, g, o}

In addition, a non-target symbol is assumed to be as follows. Here, ε is a symbol representing that it is undetermined which piece of acoustic information among acoustic information pieces represented by recognition-target symbols is included in a speech signal.

non-target symbol=ε

In S13, among the symbol sequence, the searching unit 32 leaves a leading one of the plurality of consecutive recognition-target symbols, and deletes the second and subsequent recognition-target symbols. For example, in the example illustrated in FIG. 5, the third symbol and the fourth symbol are both “g”. In addition, the 13th symbol and the 14th symbol are both “d”. Thus, in S13, the searching unit 32 leaves the third symbol, and deletes the fourth symbol. In addition, the searching unit 32 leaves the 13th symbol, and deletes the 14th symbol.

Subsequently, in S14, among the symbol sequence processed in S13, the searching unit 32 leaves recognition-target symbols, and deletes non-target symbols. For example, as illustrated in the example in FIG. 6, the searching unit 32 deletes “ε” from the symbol sequence, and leaves “d”, “g”, and “o”.

Then, in S15, referring to a search model being a correspondence table of a symbol sequence and an output symbol, from the symbol sequence processed in S13 and S14, the searching unit 32 sequentially extracts, in order from a leading symbol of the symbol sequence, an output symbol matching a part of the symbol sequence. For example, as illustrated in FIG. 6, the searching unit 32 generates “good” as an output symbol.
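The processes in S13 and S14 for this example can be sketched as follows. The 15-symbol path below is hypothetical and is not the actual content of FIG. 4; it is merely constructed so that the third and fourth symbols are “g” and the 13th and 14th symbols are “d”, as stated above. The label "eps" stands in for ε, and the function names are likewise hypothetical.

```python
from itertools import groupby
from typing import List

BLANK = "eps"  # stands in for the non-target symbol ε


def collapse_repeats(symbols: List[str], blank: str = BLANK) -> List[str]:
    """S13: keep the leading one of consecutive identical recognition-target symbols."""
    out = []
    for sym, run in groupby(symbols):
        if sym == blank:
            out.extend(run)  # non-target symbols are left for S14
        else:
            out.append(sym)
    return out


def drop_blanks(symbols: List[str], blank: str = BLANK) -> List[str]:
    """S14: delete non-target symbols and leave recognition-target symbols."""
    return [s for s in symbols if s != blank]


# Hypothetical path; positions 3-4 are "g" and positions 13-14 are "d".
path = ["eps", "eps", "g", "g", "eps", "o", "eps", "eps", "o",
        "eps", "eps", "eps", "d", "d", "eps"]
after_s13 = collapse_repeats(path)
after_s14 = drop_blanks(after_s13)
```

In this hypothetical path, the non-target symbols between the two “o” symbols are what keep them from being merged in S13, so that the sequence g, o, o, d remains after S14 and can be matched to “good” in S15.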

FIG. 7 is a diagram illustrating a processing flow of the filtering unit 28. When the filtering unit 28 receives a score vector from the score calculator 26, the filtering unit 28 executes processing on the received score vector using a procedure as illustrated in FIG. 7.

First, in S21, the filtering unit 28 identifies a representative symbol in the score vector. In other words, the filtering unit 28 identifies the best score among a plurality of scores included in the score vector. Then, the filtering unit 28 identifies, as a representative symbol, a symbol corresponding to the identified best score.

Subsequently, in S22, the filtering unit 28 determines whether the identified representative symbol is a recognition-target symbol. In other words, the filtering unit 28 determines whether the acquired score vector is the first score vector in which a representative symbol is a recognition-target symbol.

When the representative symbol is a recognition-target symbol, that is, when the acquired score vector is the first score vector (Yes in S22), the filtering unit 28 advances the processing to S25. Then, in S25, the filtering unit 28 causes the acquired first score vector to pass through, giving it to the searching unit 32 on a subsequent stage. The filtering unit 28 can thereby add, to the search target, a score vector (the first score vector) for which the possibility that a recognition-target symbol is selected as a path is high. As a result, the filtering unit 28 can maintain recognition accuracy in the searching unit 32 on the subsequent stage.

In addition, when the representative symbol is not a recognition-target symbol, that is, when the representative symbol is a non-target symbol (No in S22), the filtering unit 28 advances the processing to S23.

In S23, the filtering unit 28 determines whether the score of the representative symbol is worse than the predefined first threshold. In other words, in S23, the filtering unit 28 determines whether the acquired score vector is the second score vector in which the representative symbol is a non-target symbol, and the score of the representative symbol is worse than the first threshold.

When the score of the representative symbol is worse than the first threshold, that is, when the acquired score vector is the second score vector (Yes in S23), the filtering unit 28 advances the processing to S25. Then, in S25, the filtering unit 28 causes the acquired second score vector to pass through, giving it to the searching unit 32 on the subsequent stage. The filtering unit 28 can thereby add, to the search target, a score vector (the second score vector) for which the possibility that a non-target symbol is selected as a path is lower than a predefined value. As a result, the filtering unit 28 can maintain recognition accuracy in the searching unit 32 on the subsequent stage.

In addition, when the score of the representative symbol is equal to the first threshold or better than the first threshold (No in S23), the filtering unit 28 advances the processing to S24. In other words, when the acquired score vector is the third score vector in which the representative symbol is a non-target symbol, and the score of the representative symbol is equal to the first threshold or better than the first threshold, the filtering unit 28 advances the processing to S24.

In S24, the filtering unit 28 determines whether the acquired score vector (i.e., the third score vector) satisfies the predefined first condition. For example, the first condition is a condition for determining whether an input signal can be recognized more accurately in a case in which the acquired score vector is included in the score vector sequence than in a case in which the acquired score vector is not included in the score vector sequence.

When the acquired third score vector satisfies the first condition (Yes in S24), the filtering unit 28 advances the processing to S25. Then, in S25, the filtering unit 28 causes the acquired third score vector satisfying the first condition to pass through, giving it to the searching unit 32 on the subsequent stage. The filtering unit 28 can thereby add, to the search target, a score vector (the third score vector satisfying the first condition) for which the possibility that a recognition-target symbol is selected as a path is low, but the possibility of contributing to the maintenance of recognition accuracy is high. As a result, the filtering unit 28 can maintain recognition accuracy in the searching unit 32 on the subsequent stage.

In addition, when the acquired third score vector does not satisfy the first condition (No in S24), the filtering unit 28 advances the processing to S26. Then, in S26, the filtering unit 28 deletes the acquired third score vector not satisfying the first condition from the score vector sequence. The filtering unit 28 can thereby exclude, from the search target, a score vector (the third score vector not satisfying the first condition) for which the possibility that a recognition-target symbol is selected as a path is low, and furthermore, the possibility of contributing to the maintenance of recognition accuracy is also low. As a result, the filtering unit 28 can reduce calculation costs in the searching unit 32 on the subsequent stage.

Then, when the filtering unit 28 finishes the processing in S25 or S26, the filtering unit 28 repeats the processing from S21 for the next score vector. In addition, the filtering unit 28 may execute the processes in S22, S23, and S24 in any order, and may collectively execute the processes.
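As a non-limiting illustration, the flow of S21 to S26 may be sketched as follows. The sketch assumes that a score vector is a dictionary from symbol to score, that a larger value is a better score, and that there is a single non-target symbol ε. The names `classify`, `passes_filter`, and `first_condition` are introduced only for this sketch.

```python
EPSILON = "ε"  # assumed single non-target symbol


def representative(score_vector):
    """S21: the symbol with the best (here: largest) score."""
    return max(score_vector, key=score_vector.get)


def classify(score_vector, first_threshold):
    """S22 and S23: label a score vector as 'first', 'second', or 'third'."""
    rep = representative(score_vector)
    if rep != EPSILON:
        return "first"   # representative symbol is a recognition-target symbol
    if score_vector[rep] < first_threshold:
        return "second"  # non-target, score worse than the first threshold
    return "third"       # non-target, score equal to or better than the threshold


def passes_filter(score_vector, first_threshold, first_condition):
    """S24 to S26: True means pass through (S25), False means delete (S26).

    `first_condition` is a caller-supplied predicate standing in for the
    predefined first condition.
    """
    kind = classify(score_vector, first_threshold)
    if kind in ("first", "second"):
        return True
    return bool(first_condition(score_vector))
```

Passing the first condition as a predicate keeps the sketch independent of the specific conditions, which are described with reference to FIG. 8 and FIG. 9.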

FIG. 8 is a diagram illustrating a first example of a score vector sequence obtainable before processing is performed by the filtering unit 28, and a score vector sequence obtainable after the processing is performed by the filtering unit 28. FIG. 8 illustrates a representative symbol (symbol having the best score) included in each score vector. In addition, in FIG. 8, ε with an underscore represents a score vector in which a representative symbol is a non-target symbol, and the score of the representative symbol is equal to the first threshold or better than the first threshold. In other words, ε with an underscore represents the third score vector. The same applies to FIG. 9.

The filtering unit 28 determines at least one and at most K−1 third score vectors of K consecutive third score vectors (K is an integer of two or more) as the third score vectors satisfying the first condition. Then, among the K consecutive third score vectors, the filtering unit 28 causes the third score vectors satisfying the first condition to pass through, and deletes the third score vectors not satisfying the first condition.

For example, the filtering unit 28 determines one third score vector of the K consecutive third score vectors as the third score vector satisfying the first condition. Then, among the K consecutive third score vectors, the filtering unit 28 causes the one third score vector satisfying the first condition to pass through, and deletes the third score vectors not satisfying the first condition.

For example, in the example illustrated in FIG. 8, the fifth to seventh score vectors constitute three consecutive third score vectors. In the example illustrated in FIG. 8, among the fifth to seventh score vectors, the filtering unit 28 causes the leading score vector (the fifth score vector) to pass through to a subsequent stage, as the third score vector satisfying the first condition. Then, the filtering unit 28 deletes the score vectors other than the leading score vector (the sixth and seventh score vectors) as the third score vectors not satisfying the first condition.

In addition, for example, in the example illustrated in FIG. 8, the ninth to 13th score vectors constitute five consecutive third score vectors. In the example illustrated in FIG. 8, among the ninth to 13th score vectors, the filtering unit 28 causes the leading score vector (the ninth score vector) to pass through to a subsequent stage, as the third score vector satisfying the first condition. Then, the filtering unit 28 deletes the score vectors other than the leading score vector (the tenth to 13th score vectors) as the third score vectors not satisfying the first condition.

In this manner, when K consecutive third score vectors are included, the filtering unit 28 can reliably give one third score vector to the searching unit 32.

When two identical recognition-target symbols are consecutively arranged, the searching unit 32 executes recognition processing by combining these two recognition-target symbols into one (e.g., the processing illustrated in FIG. 5). Thus, if all non-target symbols existing between the two identical recognition-target symbols are deleted, the searching unit 32 misrecognizes, as one recognition-target symbol, the two recognition-target symbols that should originally be recognized as separate symbols.

To avoid such misrecognition, the filtering unit 28 gives at least one non-target symbol of the K consecutive third score vectors to the searching unit 32. The searching unit 32 can thereby separately recognize each of the two recognition-target symbols without executing the processing of combining two recognition-target symbols into one.

In this manner, because the filtering unit 28 leaves at least one of the non-target symbols existing between two recognition-target symbols, the filtering unit 28 can avoid misrecognition. Meanwhile, when K (K is an integer of two or more) non-target symbols are consecutively arranged, the filtering unit 28 deletes at least one of the non-target symbols. This can also reduce costs of calculation performed by the searching unit 32.

In addition, the filtering unit 28 may cause a score vector other than a leading score vector among two or more consecutive third score vectors to pass through. In addition, the filtering unit 28 may cause any number of third score vectors to pass through as long as the number is at least one and at most K−1.
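As a non-limiting illustration, the first example (keeping only the leading score vector of each run of consecutive third score vectors) may be sketched as follows. The dictionary representation of a score vector, the larger-is-better score convention, and the function names are assumptions of this sketch. A third score vector that has no adjacent third score vector is passed through here, which is a choice of the sketch because the description above concerns runs of two or more.

```python
EPSILON = "ε"  # assumed single non-target symbol


def is_third(score_vector, first_threshold):
    """True when the representative symbol is ε and its score reaches the threshold."""
    rep = max(score_vector, key=score_vector.get)
    return rep == EPSILON and score_vector[rep] >= first_threshold


def keep_leading_third(score_vectors, first_threshold):
    """Pass through the leading third score vector of each run; delete the rest."""
    kept = []
    previous_was_third = False
    for vec in score_vectors:
        third = is_third(vec, first_threshold)
        if not (third and previous_was_third):
            kept.append(vec)  # first/second vectors and run leaders pass through
        previous_was_third = third
    return kept
```

With a run of three third score vectors between a “g” vector and an “o” vector, the five input vectors are reduced to three: “g”, one ε, and “o”.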

FIG. 9 is a diagram illustrating a second example of a score vector sequence obtainable before processing is performed by the filtering unit 28, and a score vector sequence obtainable after the processing is performed by the filtering unit 28. A sequence constituted by K consecutive third score vectors (K is an integer of two or more) is referred to as a partial vector sequence.

When a representative symbol included in an immediately preceding score vector of a partial vector sequence, and a representative symbol included in an immediately following score vector of the partial vector sequence are identical, the filtering unit 28 determines at least one and at most K−1 third score vectors of the partial vector sequence as the third score vectors satisfying the first condition. Then, among the partial vector sequence, the filtering unit 28 causes the third score vectors satisfying the first condition to pass through, and deletes the third score vectors not satisfying the first condition.

For example, in the example illustrated in FIG. 9, the fifth to seventh score vectors constitute a partial vector sequence. A representative symbol included in an immediately preceding score vector (the fourth score vector) of the partial vector sequence including the fifth to seventh score vectors is “g”. In addition, a representative symbol included in an immediately following score vector (the eighth score vector) of the partial vector sequence including the fifth to seventh score vectors is “o”. In other words, the representative symbol included in the immediately preceding score vector of the partial vector sequence including the fifth to seventh score vectors, and the representative symbol included in the immediately following score vector of the partial vector sequence are not identical. Thus, in the example illustrated in FIG. 9, the filtering unit 28 deletes all the three score vectors constituting the partial vector sequence including the fifth to seventh score vectors, as the third score vectors not satisfying the first condition.

In addition, for example, in the example illustrated in FIG. 9, the ninth to 13th score vectors constitute a partial vector sequence. A representative symbol included in an immediately preceding score vector (the eighth score vector) of the partial vector sequence including the ninth to 13th score vectors is “o”. In addition, a representative symbol included in an immediately following score vector (the 14th score vector) of the partial vector sequence including the ninth to 13th score vectors is “o”. In other words, the representative symbol included in the immediately preceding score vector of the partial vector sequence including the ninth to 13th score vectors, and the representative symbol included in the immediately following score vector of the partial vector sequence are identical. Thus, in the example illustrated in FIG. 9, among the partial vector sequence including the ninth to 13th score vectors, the filtering unit 28 causes the leading score vector to pass through to a subsequent stage, as the third score vector satisfying the first condition. Then, among the partial vector sequence including the ninth to 13th score vectors, the filtering unit 28 deletes the score vectors other than the leading score vector as the third score vectors not satisfying the first condition.

Even if the filtering unit 28 executes the processing in this manner, the filtering unit 28 can give, to the searching unit 32, at least one of the non-target symbols existing between two identical recognition-target symbols. The searching unit 32 can thereby separately recognize each of the two recognition-target symbols without executing the processing of combining two recognition-target symbols into one. Thus, the filtering unit 28 can avoid misrecognition.

In addition, the filtering unit 28 does not give, to the searching unit 32, non-target symbols existing between two nonidentical recognition-target symbols. Accordingly, the searching unit 32 need not execute search processing of non-target symbols existing between two nonidentical recognition-target symbols. Thus, the filtering unit 28 can further reduce calculation costs of search processing.

In addition, among a partial vector sequence, the filtering unit 28 may cause a score vector other than a leading score vector to pass through. In addition, among the partial vector sequence, the filtering unit 28 may cause any number of score vectors to pass through as long as the number is at least one and at most K−1.
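As a non-limiting illustration, the second example may be sketched as follows. The dictionary representation of a score vector, the larger-is-better convention, and the function names are assumptions of this sketch. The sketch also assumes that a run of third score vectors at the very start or end of the sequence, or one whose neighboring representative symbol is ε, is deleted entirely; the description above does not address those cases.

```python
EPSILON = "ε"  # assumed single non-target symbol


def representative(score_vector):
    return max(score_vector, key=score_vector.get)


def is_third(score_vector, first_threshold):
    rep = representative(score_vector)
    return rep == EPSILON and score_vector[rep] >= first_threshold


def filter_by_neighbors(score_vectors, first_threshold):
    """Keep the leading vector of a run of third score vectors only when the
    representative symbols immediately before and after the run are identical."""
    kept = []
    i, n = 0, len(score_vectors)
    while i < n:
        if not is_third(score_vectors[i], first_threshold):
            kept.append(score_vectors[i])
            i += 1
            continue
        j = i
        while j < n and is_third(score_vectors[j], first_threshold):
            j += 1  # score_vectors[i:j] is the run of third score vectors
        before = representative(score_vectors[i - 1]) if i > 0 else None
        after = representative(score_vectors[j]) if j < n else None
        if before is not None and before != EPSILON and before == after:
            kept.append(score_vectors[i])  # leading vector separates the twins
        i = j
    return kept
```

For a sequence with representative symbols g, ε, ε, ε, o, ε, ε, o (all ε being third score vectors), the result has representative symbols g, o, ε, o: the run between “g” and “o” is deleted, and one ε survives between the two “o” symbols.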

FIG. 10 is a diagram illustrating an example of a pseudo-code representing recognition processing performed by the recognition apparatus 10. As an example, the recognition apparatus 10 executes the pseudo-code illustrated in FIG. 10, sequentially from the first line.

On the first line, the recognition apparatus 10 substitutes ξ_(initial) into ξ, 0 into η, and ε into σ.

In ξ, a plurality of symbol sequences being searched for, and corresponding output symbols are stored. For example, in ξ, a path of a WFST retrieved based on the Viterbi algorithm may be stored. ξ_(initial) represents an initial state of ξ. By executing the first line, the recognition apparatus 10 can initialize ξ.

η is a variable into which any of 0, 1, and 2 is to be substituted. More specifically, when 0 is substituted thereinto, η indicates that the representative symbol of the ith frame is a non-target symbol, and search processing has been executed for the ith frame. When 1 is substituted thereinto, η indicates that the representative symbol of the ith frame is a recognition-target symbol, and search processing has been executed. When the representative symbol is a recognition-target symbol, the recognition apparatus 10 always executes the search processing for the frame. When 2 is substituted thereinto, η indicates that the representative symbol of the ith frame is a non-target symbol, that the representative symbols in all frames from the frame following the last frame in which a representative symbol is a recognition-target symbol, to the ith frame, are non-target symbols, and that search processing has not been executed.

σ is a variable in which a representative symbol of the current frame is stored. ε represents a non-target symbol. By executing the first line, the recognition apparatus 10 can substitute initial values into η and σ.

The second line indicates that integers from 1 to N are sequentially substituted into i, and processing on the third to 20th lines is repeated each time an integer is substituted into i. i is a variable. N represents a total number of frames of an input signal. The recognition apparatus 10 executes the processing on the third to 20th lines for each of the first frame to the Nth frame of the input signal.

On the third line, the recognition apparatus 10 substitutes a processing result of extract_features(f_(i)) into v. v is a variable in which a feature vector is stored. f_(i) represents an input signal of the ith frame.

extract_features(f_(i)) represents a function for calculating a feature vector from the input signal of the ith frame. By executing the third line, the recognition apparatus 10 can calculate the feature vector of the ith frame.

On the fourth line, the recognition apparatus 10 substitutes calc_scores(v) into s. s is a variable in which a score vector is stored. calc_scores(v) is a function for calculating a score vector from a feature vector. By executing the fourth line, the recognition apparatus 10 can calculate the score vector of the ith frame.

On the fifth line, the recognition apparatus 10 substitutes σ into σ_(prev). σ_(prev) is a variable in which a representative symbol of an immediately preceding frame (the (i−1)th frame) is stored. In addition, when the current frame is the first frame, ε, which is the initial value of σ, is substituted into σ_(prev).

On the sixth line, the recognition apparatus 10 substitutes a representative symbol of the ith frame into σ. The function written on the sixth line as “argmax s[a]” with “a∈Σ” added below it is a function for acquiring a representative symbol, that is, a function for acquiring a symbol having the largest s[a] among symbols included in Σ. Σ represents a symbol set. s[a] represents the score corresponding to “a” in the score vector of the ith frame. By executing the function, the recognition apparatus 10 can acquire a symbol corresponding to the largest score from the score vector of the ith frame.

In addition, in the pseudo-code illustrated in FIG. 10, a larger value of a score included in a score vector indicates a better score. If a smaller value of the score indicates a better score, on the sixth line, the recognition apparatus 10 may execute a function in which “a∈Σ” is added below “argmin s[a]”. By executing the function, the recognition apparatus 10 can acquire a symbol corresponding to the smallest score from the score vector of the ith frame.

On the seventh line, the recognition apparatus 10 determines whether σ=ε is satisfied. In other words, the recognition apparatus 10 determines whether the representative symbol of the ith frame is a non-target symbol. When the representative symbol of the ith frame is a non-target symbol, the recognition apparatus 10 executes the eighth to 15th lines. In addition, when the representative symbol of the ith frame is not a non-target symbol, that is, when the representative symbol of the ith frame is a recognition-target symbol, the recognition apparatus 10 executes the 17th to 20th lines.

On the eighth line, the recognition apparatus 10 determines whether s[ε]<θ is satisfied. s[ε] represents the score of the non-target symbol of the ith frame, that is, the score of the representative symbol of the ith frame. θ represents a predefined first threshold. In other words, the recognition apparatus 10 determines whether the score of the representative symbol is smaller than the predefined first threshold. The recognition apparatus 10 can thereby determine whether the score of the representative symbol is worse than the first threshold. In addition, when a smaller value of the score indicates a better score, on the eighth line, the recognition apparatus 10 can determine the score by determining whether s[ε]>θ is satisfied.

When the score of the representative symbol of the ith frame is smaller than the first threshold, that is, the score of the representative symbol of the ith frame is worse than the first threshold, the recognition apparatus 10 executes the ninth to tenth lines. When the score of the representative symbol of the ith frame is not smaller than the first threshold, that is, the score of the representative symbol of the ith frame is equal to the first threshold or better than the first threshold, the recognition apparatus 10 executes the 12th to 15th lines.
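As a non-limiting illustration, the two score conventions mentioned for the sixth and eighth lines (larger is better, or smaller is better) may be handled with a single flag as sketched below. The flag `larger_is_better` and the function names are assumptions of this sketch and do not appear in FIG. 10.

```python
def representative_symbol(scores, larger_is_better=True):
    """Sixth line: argmax over the symbol set, or argmin when smaller is better."""
    pick = max if larger_is_better else min
    return pick(scores, key=scores.get)


def is_worse_than_threshold(score, theta, larger_is_better=True):
    """Eighth line: s[ε] < θ when larger is better, s[ε] > θ when smaller is better."""
    return score < theta if larger_is_better else score > theta
```

In both conventions a score exactly equal to θ is not "worse", which matches the treatment of the third score vector as having a score equal to or better than the first threshold.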

On the ninth line, the recognition apparatus 10 substitutes a processing result of search(ξ,s) into ξ. search(ξ,s) is a function for acquiring a search result of a symbol sequence and an output symbol from a score vector sequence to which the score vector of the ith frame is added. By executing the ninth line, the recognition apparatus 10 can generate a symbol sequence and an output symbol at a stage where searching of frames up to the ith frame has been finished. When the path of a WFST is searched for based on the Viterbi algorithm, the recognition apparatus 10 may extend the path of the WFST by one new score vector, and store the resultant path into ξ as a processing result.

On the tenth line, the recognition apparatus 10 substitutes 0 into η. η can thereby indicate that the representative symbol of the ith frame is a non-target symbol, and search processing has been executed for the ith frame.

When the recognition apparatus 10 finishes the tenth line, the recognition apparatus 10 finishes the processing for the ith frame, and executes the processing from the third line for the next frame.

On the 12th line, the recognition apparatus 10 determines whether η=1 is satisfied. In other words, the recognition apparatus 10 determines whether a representative symbol of the immediately preceding frame (the (i−1)th frame) is a recognition-target symbol.

When the representative symbol of the immediately preceding frame is a recognition-target symbol, the recognition apparatus 10 executes the 13th to 15th lines.

On the 13th line, the recognition apparatus 10 substitutes s into s_(ε). s_(ε) is a variable in which a score vector is stored. By executing the 13th line, the recognition apparatus 10 stores the score vector of the ith frame into s_(ε). The recognition apparatus 10 can thereby store, into s_(ε), the score vector of the frame following the frame in which a representative symbol is a recognition-target symbol.

On the 14th line, the recognition apparatus 10 substitutes σ_(prev) into r. r is a variable in which a symbol is stored. By executing the 14th line, the recognition apparatus 10 stores, into r, the representative symbol of the immediately preceding frame (the (i−1)th frame). The recognition apparatus 10 can thereby store, into r, the representative symbol in the last frame in which the representative symbol is a recognition-target symbol.

On the 15th line, the recognition apparatus 10 substitutes 2 into η. η can thereby indicate that the representative symbols in all frames from the frame following the last frame in which the representative symbol is a recognition-target symbol, to the ith frame, are non-target symbols, and that search processing has not been executed.

When the recognition apparatus 10 finishes the 15th line, the recognition apparatus 10 finishes the processing for the ith frame, and executes the processing from the third line for the next frame.

In addition, on the 12th line, when the recognition apparatus 10 determines that the representative symbol of the immediately preceding frame is not a recognition-target symbol, that is, when the representative symbol of the ith frame is a non-target symbol and the representative symbol of the immediately preceding frame is a non-target symbol, the recognition apparatus 10 finishes the processing for the ith frame, and executes the processing from the third line for the next frame. The recognition apparatus 10 can thereby advance the processing to the next frame without executing search processing for the ith frame.

On the other hand, on the 17th line, the recognition apparatus 10 determines whether η=2 and r=σ are satisfied. By determining whether η=2 is satisfied, the recognition apparatus 10 can determine whether the representative symbols in all the frames from the frame following the last frame in which the representative symbol is a recognition-target symbol, to the (i−1)th frame, are non-target symbols, and search processing has not been executed. In addition, by determining whether r=σ is satisfied, the recognition apparatus 10 can determine whether the representative symbol in the last frame in which the representative symbol is a recognition-target symbol, and the representative symbol in the ith frame are identical.

When η=2 and r=σ are satisfied, the recognition apparatus 10 executes the 18th line.

On the 18th line, the recognition apparatus 10 substitutes a processing result of search(ξ,s_(ε)) into ξ. By executing the 18th line, the recognition apparatus 10 can add, to the score vector sequence, the score vector of the frame following the last frame in which the representative symbol is a recognition-target symbol, and acquire a search result of a symbol sequence and an output symbol. In other words, by executing the 18th line, the recognition apparatus 10 can add a score vector of a frame in which a representative symbol is a non-target symbol to the score vector sequence, and acquire a search result of a symbol sequence and an output symbol. The recognition apparatus 10 can thereby avoid processing of collectively recognizing identical recognition-target symbols as one, and maintain recognition accuracy.

On the 19th line, the recognition apparatus 10 substitutes a processing result of search(ξ,s) into ξ. By executing the 19th line, the recognition apparatus 10 can add the score vector of the ith frame to the score vector sequence, and acquire a search result of a symbol sequence and an output symbol.

On the 20th line, the recognition apparatus 10 substitutes 1 into η. η can thereby indicate that the representative symbol of the ith frame is a recognition-target symbol, and search processing has been executed. When the recognition apparatus 10 finishes the 20th line, the recognition apparatus 10 finishes the processing for the ith frame, and executes the processing from the third line for the next frame.

Then, on the 21st line, the recognition apparatus 10 returns a processing result of result(ξ), which acquires an output symbol by referring to ξ, to the program that invoked this pseudo-code. The recognition apparatus 10 can thereby output an output symbol.

As described above, by executing the processing according to the pseudo-code illustrated in FIG. 10, when a representative symbol included in an immediately preceding score vector of a partial vector sequence constituted by K consecutive third score vectors, and a representative symbol included in an immediately following score vector of the partial vector sequence are identical, the recognition apparatus 10 can determine a leading third score vector of the partial vector sequence as the third score vector satisfying the first condition.

In other words, the recognition apparatus 10 can search for the first score vector and the second score vector among the score vector sequence. Furthermore, among the score vector sequence, when a representative symbol included in an immediately preceding score vector of a partial vector sequence constituted by K consecutive third score vectors, and a representative symbol included in an immediately following score vector of the partial vector sequence are identical, the recognition apparatus 10 can search for a leading third score vector of the partial vector sequence, and skip the searching of third score vectors other than the leading third score vector of the partial vector sequence.

The recognition apparatus 10 can thereby perform pattern recognition of an input signal with small calculation costs while maintaining recognition accuracy.

In addition, the 17th line in the pseudo-code illustrated in FIG. 10 may be replaced by “if η=2”. If a pseudo-code changed in this manner is executed, when a partial vector sequence constituted by K consecutive third score vectors is included, the recognition apparatus 10 can determine a leading third score vector of the partial vector sequence as the third score vector satisfying the first condition.

In other words, the recognition apparatus 10 can search for the first score vector and the second score vector among the score vector sequence. Furthermore, among the score vector sequence, the recognition apparatus 10 can search for a leading third score vector of a partial vector sequence constituted by two or more consecutive third score vectors, and skip the searching of third score vectors other than the leading third score vector of the partial vector sequence.

In addition, the pseudo-code illustrated in FIG. 10 indicates an example of searching for a leading one score vector of a partial vector sequence, and skipping the searching of remaining score vectors. Nevertheless, the recognition apparatus 10 may search for a score vector other than the leading score vector of the partial vector sequence, or may search for one or more score vectors of the partial vector sequence.
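As a non-limiting illustration, the line-by-line description of FIG. 10 given above may be rendered as the following sketch, reconstructed from that description rather than copied from the figure. The callables `extract_features`, `calc_scores`, `search`, and `result` are supplied by the caller and stand in for the functions named in the description. The dictionary representation of a score vector, the larger-is-better convention, and the flag `require_same_symbol` (which switches the 17th line to “if η=2”) are assumptions of this sketch.

```python
EPSILON = "ε"  # assumed single non-target symbol


def recognize(frames, extract_features, calc_scores, search, result,
              xi_initial, theta, require_same_symbol=True):
    xi, eta, sigma = xi_initial, 0, EPSILON        # line 1
    s_eps, r = None, None                          # set on lines 13 and 14
    for f in frames:                               # line 2
        v = extract_features(f)                    # line 3
        s = calc_scores(v)                         # line 4
        sigma_prev = sigma                         # line 5
        sigma = max(s, key=s.get)                  # line 6 (argmax over symbols)
        if sigma == EPSILON:                       # line 7
            if s[EPSILON] < theta:                 # line 8: second score vector
                xi = search(xi, s)                 # line 9
                eta = 0                            # line 10
            elif eta == 1:                         # line 12: leading third vector
                s_eps = s                          # line 13
                r = sigma_prev                     # line 14
                eta = 2                            # line 15
            # otherwise: skip the frame without searching
        else:
            # line 17; with require_same_symbol=False this is just "if η=2"
            if eta == 2 and (not require_same_symbol or r == sigma):
                xi = search(xi, s_eps)             # line 18
            xi = search(xi, s)                     # line 19
            eta = 1                                # line 20
    return result(xi)                              # line 21
```

Because the leading third score vector is only stored on the 13th line and searched later on the 18th line, the decision whether it satisfies the first condition can be deferred until the symbol following the run is known.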

Modified Example

For example, when a search model is a WFST, the searching unit 32 can search for a plurality of symbol paths. In such a case, the filtering unit 28 may perform the following processing.

When a symbol having the k_(p)th best score (k_(p) is an integer of one or more) included in an immediately preceding score vector of a partial vector sequence constituted by K consecutive third score vectors (K is an integer of two or more), and a symbol having the k_(n)th best score (k_(n) is an integer of one or more) included in an immediately following score vector of the partial vector sequence are identical, the filtering unit 28 may determine at least one and at most K−1 third score vectors of the partial vector sequence as the third score vectors satisfying the first condition. The recognition apparatus 10 can thereby maintain recognition accuracy when the searching unit 32 does not select a representative symbol in the immediately preceding score vector of the partial vector sequence or the immediately following score vector of the partial vector sequence.

In addition, when a set of symbols having the top k_(p) scores (k_(p) is an integer of one or more) included in an immediately preceding score vector of a partial vector sequence constituted by K consecutive third score vectors (K is an integer of two or more), and a set of symbols having the top k_(p) scores included in an immediately following score vector of the partial vector sequence are identical, the filtering unit 28 may determine at least one and at most K−1 third score vectors of the partial vector sequence as the third score vectors satisfying the first condition. Even in this case, the recognition apparatus 10 can maintain recognition accuracy when the searching unit 32 does not select a representative symbol in the immediately preceding score vector of the partial vector sequence or the immediately following score vector of the partial vector sequence.

In addition, when at least one of the symbols having the top k_(p) scores (k_(p) is an integer of one or more) included in an immediately preceding score vector of a partial vector sequence constituted by K consecutive third score vectors (K is an integer of two or more) is identical to any one of the symbols having the top k_(n) scores (k_(n) is an integer of one or more) included in an immediately following score vector of the partial vector sequence, the filtering unit 28 may determine at least one and at most K−1 third score vectors of the partial vector sequence as the third score vectors satisfying the first condition. The recognition apparatus 10 can thereby maintain recognition accuracy when there is any possibility that the searching unit 32 selects identical symbols in the immediately preceding score vector of the partial vector sequence and the immediately following score vector of the partial vector sequence.
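As a non-limiting illustration, the three comparisons described in this modified example may be sketched as follows. The dictionary representation of a score vector, the larger-is-better convention, and the function names are assumptions of this sketch. Whether the non-target symbol ε participates in the ranking is not specified above; the sketch ranks all symbols, and ties are resolved by dictionary order.

```python
def ranked(score_vector):
    """Symbols ordered from best to worst score (larger is better)."""
    return sorted(score_vector, key=score_vector.get, reverse=True)


def kth_best_identical(before, after, k_p, k_n):
    """The k_p-th best symbol before the run equals the k_n-th best symbol after it."""
    return ranked(before)[k_p - 1] == ranked(after)[k_n - 1]


def top_sets_identical(before, after, k_p):
    """The sets of the top k_p symbols before and after the run are identical."""
    return set(ranked(before)[:k_p]) == set(ranked(after)[:k_p])


def top_sets_overlap(before, after, k_p, k_n):
    """At least one top-k_p symbol before the run is among the top-k_n symbols after it."""
    return bool(set(ranked(before)[:k_p]) & set(ranked(after)[:k_n]))
```

Each function takes the score vectors immediately preceding and immediately following a partial vector sequence and returns whether at least one third score vector of that sequence would be passed through under the corresponding variant.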

FIG. 11 is a hardware block diagram of the recognition apparatus 10. Asan example, the recognition apparatus 10 is realized by a hardwareconfiguration similar to that of a general computer (informationprocessing apparatus). The recognition apparatus 10 includes a centralprocessing unit (CPU) 101, an operation unit 102, a display 103, amicrophone 104, a read only memory (ROM) 105, a random access memory(RAM) 106, a storage 107, a communication device 108, and a bus 109. Theunits are connected by the bus 109.

The CPU 101 uses a predetermined area of the RAM 106 as a work area, and executes various types of processing in cooperation with various programs prestored in the ROM 105 or the storage 107, to comprehensively control operations of the units (the feature extractor 22, the score calculator 26, the filtering unit 28, and the searching unit 32) constituting the recognition apparatus 10. In addition, the CPU 101 controls the operation unit 102, the display 103, the microphone 104, the communication device 108, and the like, in cooperation with programs prestored in the ROM 105 or the storage 107.

The operation unit 102 is an input device such as a mouse or a keyboard. The operation unit 102 receives, as an instruction signal, information input through a user operation, and outputs the instruction signal to the CPU 101.

The display 103 is a display device such as a liquid crystal display (LCD). The display 103 displays various types of information based on a display signal from the CPU 101. For example, the display 103 displays an output symbol or the like that is output by the searching unit 32. In addition, when an output symbol or the like is output to the communication device 108, the storage 107, or the like, the recognition apparatus 10 need not include the display 103.

The microphone 104 is a device for inputting a speech signal. When pattern recognition of a prerecorded speech signal or a speech signal input from the communication device 108 is performed, or when pattern recognition of an input signal having a type other than speech is performed, the recognition apparatus 10 need not include the microphone 104.

The ROM 105 stores programs and various types of setting information that are used for the control of the recognition apparatus 10, in a non-rewritable manner. The RAM 106 is a volatile storage medium such as a synchronous dynamic random access memory (SDRAM). The RAM 106 functions as a work area of the CPU 101. More specifically, the RAM 106 functions as a buffer or the like that temporarily stores various variables, parameters, and the like that are used by the recognition apparatus 10.

The storage 107 is a rewritable recording device such as a semiconductor storage medium including a flash memory, or a magnetically or optically recordable storage medium. The storage 107 stores programs and various types of setting information that are used for the control of the recognition apparatus 10. In addition, the storage 107 stores information stored in the pattern recognition model storage 24, the search model storage 30, and the like.

The communication device 108 communicates with an external device to be used for the output or the like of an output symbol or the like. When pattern recognition of a prerecorded speech signal or a speech signal input from the microphone 104 is performed, and when an output symbol or the like is output to the display 103 or the storage 107, the recognition apparatus 10 need not include the communication device 108.

In addition, when pattern recognition of a handwritten character is performed, the recognition apparatus 10 further includes a handwriting input device. In addition, when OCR is performed, the recognition apparatus 10 further includes a scanner, a camera, or the like. In addition, when gesture recognition, recognition of a hand signal, or sign-language recognition is performed, the recognition apparatus 10 further includes a video camera for inputting a moving image signal. When pattern recognition of any of these types that does not use voice is performed, the recognition apparatus 10 need not include the microphone 104.

Programs executed by the recognition apparatus 10 according to the present embodiment are provided by being recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a digital versatile disk (DVD), in files having an installable format or an executable format.

In addition, programs executed by the recognition apparatus 10 according to the present embodiment may be stored in a computer connected to a network such as the Internet, and provided by being downloaded via the network. In addition, programs executed by the recognition apparatus 10 according to the present embodiment may be provided or delivered via a network such as the Internet. In addition, programs executed by the recognition apparatus 10 according to the present embodiment may be provided by being preinstalled on a ROM or the like.

Programs executed by the recognition apparatus 10 according to the present embodiment have a module configuration including a feature extraction module, a score calculation module, a filtering module, and a searching module. By the CPU 101 (processor) reading the programs from a storage medium or the like and executing the programs, the above-described units are loaded onto a main storage device, and the feature extractor 22, the score calculator 26, the filtering unit 28, and the searching unit 32 are generated on the main storage device. In addition, a part or all of the feature extractor 22, the score calculator 26, the filtering unit 28, and the searching unit 32 may be formed by hardware.

While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiment described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiment described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.

What is claimed is:
 1. A recognition apparatus for performing pattern recognition of an input signal being a recognition target, the recognition apparatus comprising: one or more hardware processors configured to: calculate, based on the input signal, a score vector sequence in which a plurality of score vectors each including respective scores of symbols are arranged; and cause a partial score vector of the calculated score vector sequence to pass through to filter the score vector sequence, wherein the one or more hardware processors are configured to cause, among: a first score vector in which a representative symbol is a recognition-target symbol, the representative symbol being a symbol corresponding to a best score among the scores included in the first score vector; a second score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is worse than a first threshold, the non-target symbol being a symbol representing that it is undetermined which piece of information among information pieces represented by recognition-target symbols is included in the input signal; and a third score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is equal to the first threshold or better than the first threshold, a third score vector satisfying a predefined first condition, to pass through to filter the score vector sequence.
 2. The apparatus according to claim 1, wherein the one or more processors are configured to delete, among the third score vector, a third score vector not satisfying the predefined first condition to filter the score vector sequence.
 3. The apparatus according to claim 1, wherein the one or more processors are further configured to generate a pattern recognition result of the input signal by searching for a symbol path following likely scores in the filtered score vector sequence.
 4. The apparatus according to claim 3, wherein the one or more processors are configured to merge recognition-target symbols that are identical and consecutively arranged on a path, into one to generate the pattern recognition result of the input signal.
 5. The apparatus according to claim 4, wherein the one or more processors are configured to merge the recognition-target symbols that are identical and consecutively arranged on the path, into one, and then, exclude the non-target symbol on the path to generate the pattern recognition result of the input signal.
 6. The apparatus according to claim 1, wherein the one or more processors are configured to determine one or more and K−1 or less third score vectors of consecutive K (K is an integer of two or more) third score vectors as the third score vector satisfying the predefined first condition.
 7. The apparatus according to claim 6, wherein the one or more processors are configured to determine one third score vector of the consecutive K (K is an integer of two or more) third score vectors as the third score vector satisfying the predefined first condition.
 8. The apparatus according to claim 1, wherein, when a representative symbol included in an immediately preceding score vector of a partial vector sequence constituted by consecutive K (K is an integer of two or more) third score vectors, and a representative symbol included in an immediately following score vector of the partial vector sequence are identical, the one or more processors are configured to determine one or more and K−1 or less third score vectors of the partial vector sequence as the third score vector satisfying the predefined first condition.
 9. The apparatus according to claim 1, wherein, when a set of symbols having top k_(p) (k_(p) is an integer of one or more) scores that are included in an immediately preceding score vector of a partial vector sequence constituted by consecutive K (K is an integer of two or more) third score vectors, and a set of symbols having top k_(p) scores that are included in an immediately following score vector of the partial vector sequence are identical, the one or more processors are configured to determine one or more and K−1 or less third score vectors of the partial vector sequence as the third score vector satisfying the predefined first condition.
 10. The apparatus according to claim 1, wherein, when at least any one of symbols having top k_(p) (k_(p) is an integer of one or more) scores that are included in an immediately preceding score vector of a partial vector sequence constituted by consecutive K (K is an integer of two or more) third score vectors is identical to any one of symbols having top k_(n) (k_(n) is an integer of one or more) scores that are included in an immediately following score vector of the partial vector sequence, the one or more processors are configured to determine one or more and K−1 or less third score vectors of the partial vector sequence as the third score vector satisfying the predefined first condition.
 11. The apparatus according to claim 1, wherein the one or more processors are configured to calculate the score vectors using a recurrent neural network.
 12. The apparatus according to claim 11, wherein the one or more processors are configured to calculate a softmax function for outputting the respective scores of the symbols in an output layer of the recurrent neural network.
 13. The apparatus according to claim 1, wherein the input signal is a speech signal.
 14. A recognition method for performing pattern recognition of an input signal being a recognition target, the recognition method comprising: calculating, based on the input signal, a score vector sequence in which a plurality of score vectors each including respective scores of symbols are arranged; and causing a partial score vector of the calculated score vector sequence to pass through to filter the score vector sequence, wherein at the causing, among: a first score vector in which a representative symbol is a recognition-target symbol, the representative symbol being a symbol corresponding to a best score among the scores included in the first score vector; a second score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is worse than a first threshold, the non-target symbol being a symbol representing that it is undetermined which piece of information among information pieces represented by recognition-target symbols is included in the input signal; and a third score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is equal to the first threshold or better than the first threshold, a third score vector satisfying a predefined first condition is made to pass through to filter the score vector sequence.
 15. A computer program product comprising a non-transitory computer-readable medium including programmed instructions, the instructions causing a computer to function as a recognition apparatus for performing pattern recognition of an input signal being a recognition target, the instructions causing the computer to function as: a calculator configured to calculate, based on the input signal, a score vector sequence in which a plurality of score vectors each including respective scores of symbols are arranged; and a filtering unit configured to cause a partial score vector of the calculated score vector sequence to pass through, wherein the filtering unit causes, among: a first score vector in which a representative symbol is a recognition-target symbol, the representative symbol being a symbol corresponding to a best score among the scores included in the first score vector; a second score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is worse than a first threshold, the non-target symbol being a symbol representing that it is undetermined which piece of information among information pieces represented by recognition-target symbols is included in the input signal; and a third score vector in which a representative symbol is a non-target symbol, and a score of the representative symbol is equal to the first threshold or better than the first threshold, a third score vector satisfying a predefined first condition, to pass through.